Audio signal processing method and apparatus

ABSTRACT

A method of processing a series of input audio signals representing a series of virtual audio sound sources placed at predetermined positions around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of: (a) for each of the input audio signals and for each of the audio output signals: (i) convolving the input audio signals with an initial head portion of a corresponding impulse response mapping substantially the initial sound and early reflections for an impulse response of a corresponding virtual audio source to a corresponding speaker device so as to form a series of initial responses; (b) for each of the input audio signals and for each of the audio output signals: (i) forming a combined mix from the audio input signals; (ii) forming a combined convolution tail from the tails of the corresponding impulse responses; and (iii) convolving the combined mix with the combined convolution tail to form a combined tail response; (c) for each of the audio output signals: (i) combining a corresponding series of initial responses and a corresponding combined tail response to form the audio output signal.

FIELD OF THE INVENTION

The present invention relates to the field of audio signal processing and, in particular, discloses efficient convolution methods for the convolution of input audio signals with impulse response functions or the like.

BACKGROUND OF THE INVENTION

In International PCT Application No. PCT/AU93/00330 entitled “Digital Filter Having High Accuracy and Efficiency” filed by the present applicant, there is disclosed a process of convolution which has an extremely low latency in addition to allowing for effective long convolution of detailed impulse response functions.

It is known to utilize the convolution of impulse response functions to add “color” to audio signals so that when, for example, played back over headphones, the signals provide for an “out of head” listening experience. Unfortunately, the process of convolution, whilst utilizing advanced algorithmic techniques such as the fast Fourier transform (FFT), often requires excessive computational time. The computational requirements are often increased when multiple channels must be independently convolved, as is often the case when full surround sound capabilities are required. Modern DSP processors are often unable to provide the resources for full convolution of signals, especially where real time restrictions are placed on the latency of the convolution.

Hence, there is a general need to reduce the processing requirements of a full convolution system whilst substantially maintaining the overall quality of the convolution process.

SUMMARY OF THE INVENTION

In accordance with a first aspect of the present invention, there is provided a method of processing a series of input audio signals representing a series of virtual audio sound sources placed at predetermined positions around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of: (a) for each of the input audio signals and for each of the audio output signals: (i) convolving the input audio signals with an initial head portion of a corresponding impulse response mapping substantially the initial sound and early reflections for an impulse response of a corresponding virtual audio source to a corresponding speaker device so as to form a series of initial responses; (b) for each of the input audio signals and for each of the audio output signals: (i) forming a combined mix from the audio input signals; (ii) determining a single convolution tail; and (iii) convolving the combined mix with the single convolution tail to form a combined tail response; (c) for each of the audio output signals: (i) combining a corresponding series of initial responses and a corresponding combined tail response to form the audio output signal.

The single convolution tail can be formed by combining the tails of the corresponding impulse responses. Alternatively, the single convolution tail can be a chosen one of the virtual speaker tail impulse responses. Ideally, the method further comprises the step of preprocessing the impulse response functions by: (a) constructing a set of corresponding impulse response functions; (b) dividing the impulse response functions into a number of segments; (c) for a predetermined number of the segments, reducing the impulse response values at the ends of the segments.

The input audio signals are preferably translated into the frequency domain and the convolution can be carried out in the frequency domain. The impulse response functions can be simplified in the frequency domain by zeroing higher frequency coefficients and eliminating multiplication steps where the zeroed higher frequency coefficients are preferably utilized.

The convolutions are preferably carried out utilizing a low latency convolution process. The low latency convolution process preferably can include the steps of: transforming first predetermined block sized portions of the input audio signals into corresponding frequency domain input coefficient blocks; transforming second predetermined block sized portions of the impulse response signals into corresponding frequency domain impulse coefficient blocks; combining each of the frequency domain input coefficient blocks with predetermined ones of the corresponding frequency domain impulse coefficient blocks in a predetermined manner to produce combined output blocks; adding together predetermined ones of the combined output blocks to produce frequency domain output responses for each of the audio output signals; transforming the frequency domain output responses into corresponding time domain audio output signals; outputting the time domain audio output signals.

In accordance with a further aspect of the present invention, there is provided a method of processing a series of input audio signals representing a series of virtual audio sound sources placed at predetermined positions around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of: (a) forming a series of impulse response functions mapping substantially a corresponding virtual audio source to a corresponding speaker device; (b) dividing the impulse response functions into a number of segments; (c) for a predetermined number of the segments, reducing the impulse response values at the ends of the segment to produce modified impulse responses; (d) for each of the input audio signals and for each of the audio output signals: (i) convolving the input audio signals with portions of a corresponding modified impulse response mapping substantially a corresponding virtual audio source to a corresponding speaker device.

In accordance with a further aspect of the present invention, there is provided a method for providing for the simultaneous convolution of multiple audio signals representing audio signals from different first sound sources, so as to simulate an audio environment for projection from a second series of output sound sources comprising the steps of: (a) independently filtering each of the multiple audio signals with a first initial portion of an impulse response function substantially mapping the first sound sources when placed in the audio environment; and (b) providing for the combined reverberant tail filtering of the multiple audio signals with a reverberant tail filter formed from subsequent portions of the impulse response functions.

The filtering can occur via convolution in the frequency domain and the audio signals are preferably first transformed into the frequency domain. The series of input audio signals can include a left front channel signal, a right front channel signal, a front centre channel signal, a left back channel signal and a right back channel signal. The audio output signals can comprise left and right headphone output signals.

The present invention can be implemented in a number of different ways. For example, utilising a skip protection processor unit located inside a CD-ROM player unit; utilising a dedicated integrated circuit comprising a modified form of a digital to analog converter; utilising a dedicated or programmable Digital Signal Processor; or utilising a DSP processor interconnected between an Analog to Digital Converter and a Digital to Analog Converter. Alternatively, the invention can be implemented using a separately detachable external device connected intermediate of a sound output signal generator and a pair of headphones, the sound output signals being output in a digital form for processing by the external device.

Further modifications can include utilizing a variable control to alter the impulse response functions in a predetermined manner.

BRIEF DESCRIPTION OF THE DRAWINGS

Notwithstanding any other forms which may fall within the scope of the present invention, preferred forms of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 illustrates schematically the overall convolution processing for mapping a series of signals to two headphone output channels;

FIG. 2 illustrates the traditional overlap and save FFT process;

FIG. 3 illustrates the low latency process utilized in the preferred embodiment;

FIG. 4 is a schematic illustration of the generic frequency domain convolution process;

FIG. 5 illustrates a first simplification of the process of FIG. 4;

FIG. 6 illustrates the idealized processing of a series of input signals to the left ear of a headphone;

FIG. 7 illustrates a first simplification of the processing requirements of FIG. 6;

FIG. 8 illustrates in more detail a frequency domain implementation of the arrangement of FIG. 7 utilizing low latency convolution;

FIG. 9 illustrates the standard synthesis process for deriving frequency domain coefficients;

FIG. 10 illustrates a modified form of frequency domain coefficient generation;

FIG. 11 illustrates the extension of the preferred embodiment to higher frequency audio data;

FIG. 12 illustrates an implementation whereby the audio processing circuitry is utilized in place of the skip-protection function of an existing CD player;

FIG. 13 illustrates an implementation whereby the audio processing circuitry is utilized within the same IC package as a Digital to Analog converter;

FIG. 14 illustrates an implementation whereby the audio processing circuitry is utilized in the signal chain before the Digital to Analog converter;

FIG. 15 illustrates an implementation whereby the audio processing circuitry is utilized in a configuration along with Analog to Digital and Digital to Analog converters;

FIG. 16 illustrates an extension to the circuit of FIG. 15 to include an optional digital input; and

FIG. 17 illustrates a number of possible physical embodiments of the current invention.

DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS

In the preferred embodiment, it is desired to approximate the full long convolution of a series of input signals with impulse response functions for each ear such that the outputs can be summed to the left and right ears for playback over headphones.

Turning to FIG. 1, there is illustrated the full convolution process for a 6 input Dolby surround sound set of signals comprising left front, centre front, right front, left surround, right surround and low frequency effects channels, each indicated generically 2. For each channel, a left and right impulse response function is applied. Hence, for the left front channel 3, a corresponding left front impulse response function 4 is convolved 6 with the left signal. The left front impulse response function 4 is an impulse response that would be received by the left ear to an idealized spike output from a left front channel speaker placed in an idealized position. The output 7 is summed 10 to the left channel signal for the headphone.

Similarly, the corresponding impulse response 5 for the right ear for a left channel speaker is convolved 8 with the left front signal to produce an output 9 which is summed 11 to the right channel. A similar process occurs for each of the other signals.

Hence, the arrangement of FIG. 1 will require approximately 12 convolution steps for the six input signals. Such a large number of convolutions is likely to be quite onerous for a DSP chip, especially where the desired long convolution is used.

Turning now to FIG. 2, there is illustrated the standard “overlap and save” convolution process, which is fully set out in standard texts such as “Digital Signal Processing”, by John Proakis and Dimitris Manolakis, Macmillan Publishing Company, 1992.

In the traditional overlap and save method as illustrated 20 in FIG. 2, the input signal 21 is digitized and divided into N sample blocks 22 with N normally being a power of 2. Similarly, an impulse response of length N 23 is determined, normally by taking desired environmental measurements, and padded to length 2N by zeros. A first mapping 24 is applied to the 2N blocks of the impulse response 23 so as to form N complex numbers having real and imaginary coefficients. The FFT is then applied to produce N frequency coefficients. The step 24 can be carried out once before processing begins and the corresponding frequency domain coefficients 25 stored for later use.

Next, blocks of length 2N of the input audio are taken and again a fast Fourier transform is applied so as to determine corresponding frequency domain data 28 corresponding to the 2N real input values. Next, the two sets of data are element-by-element multiplied 30 so as to produce frequency domain data 31. An inverse Fourier transform is then applied to produce 2N real values with the first N 34 being discarded and the second N 35 becoming the output values 36 for the output audio. The process illustrated in FIG. 2 is well known as a standard frequency domain convolution process. Unfortunately however, due to the requirement for input data to be collected into blocks, and due to the FFT process taking a finite time, dependent on the value of N (the processing time is O(N log N)), a certain latency or delay exists from when an initial 2N input values are input to the first FFT 27 until the time of subsequent output from the inverse FFT 32. This delay or latency is often highly undesirable, especially where real time requirements must be met.
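By way of illustration only, the overlap and save process of FIG. 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of any embodiment: the function names `fft`, `overlap_save` and `direct_convolve` are hypothetical, and a plain 2N point complex FFT is used for clarity in place of the mapping 24 of 2N real values onto N complex values.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of 2. Unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_save(signal, h, n):
    """Convolve `signal` with impulse response `h` (len(h) <= n) in blocks of n."""
    # The impulse response is zero padded to 2N and transformed once, up front.
    H = fft([complex(v) for v in h] + [0j] * (2 * n - len(h)))
    prev = [0.0] * n                      # previous N input samples (silence at start)
    out = []
    for start in range(0, len(signal), n):
        block = list(signal[start:start + n])
        block += [0.0] * (n - len(block))             # pad the final partial block
        X = fft([complex(v) for v in prev + block])   # transform a 2N sample window
        Y = [a * b for a, b in zip(X, H)]             # element-by-element multiply
        y = fft(Y, inverse=True)
        # Discard the first N values (circular wrap-around), keep the second N.
        out.extend(v.real / (2 * n) for v in y[n:])
        prev = block
    return out[:len(signal)]

def direct_convolve(signal, h):
    """Reference time domain convolution, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, c in enumerate(h):
            if i - j >= 0:
                out[i] += c * signal[i - j]
    return out
```

Each output block only becomes available after N further input samples have been collected and transformed, which is the source of the latency discussed above.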

In the aforementioned PCT Application No. PCT/AU93/00330 there was disclosed a method for providing for an extremely low latency convolution process suitable for real time usage. Whilst the reader is referred to the aforementioned PCT specification, a short discussion of the low latency process will now be set out with reference to FIG. 3, which illustrates 40 the basic steps of the low latency process wherein an audio input is first converted to the frequency domain by an FFT frequency domain overlap process 41. The frequency domain data is stored 42 and then passed on to subsequent storage blocks e.g. 43, 44 after each convolution “cycle”. The frequency domain data e.g. 42 is first element-by-element multiplied 50 by corresponding frequency domain coefficients 51, with the coefficients 51 corresponding to a first portion of the impulse response function.

Simultaneously, the previously delayed frequency domain data 43 is multiplied 54 with the frequency domain coefficients 53 corresponding to a later part of the impulse response function. This step is also repeated for the rest of the impulse response function. The outputs are summed element-by-element 56 so as to produce overall frequency domain data which is inverse fast Fourier transformed with half the data discarded 57 so as to produce audio output 58. The arrangement of FIG. 3 allows for extremely long convolution to be carried out with low latency.
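The structure of FIG. 3, as described above, can be sketched as a chain of delayed frequency domain blocks, each multiplied by the coefficients of one segment of the impulse response. The following Python sketch is an illustration only and is not taken from the aforementioned PCT specification; the names `fft`, `partitioned_convolve` and `direct_convolve` are hypothetical, and all segments are assumed to share the same length N.

```python
import cmath
from collections import deque

def fft(x, sign=-1):
    """Recursive radix-2 FFT (sign=-1) or unnormalised inverse (sign=+1)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    tw = [cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def partitioned_convolve(signal, h, n):
    """Convolve `signal` with a long response `h` using segments of length n."""
    # Split h into segments, zero pad each to 2N and transform once in advance.
    segs = [h[i:i + n] for i in range(0, len(h), n)]
    H = [fft([complex(v) for v in s] + [0j] * (2 * n - len(s))) for s in segs]
    # Delay line of input spectra: delay[0] is the current cycle, delay[k] is k cycles old.
    delay = deque([[0j] * (2 * n) for _ in H], maxlen=len(H))
    prev = [0.0] * n
    out = []
    for start in range(0, len(signal), n):
        block = list(signal[start:start + n])
        block += [0.0] * (n - len(block))
        delay.appendleft(fft([complex(v) for v in prev + block]))  # oldest block drops off
        acc = [0j] * (2 * n)
        for X, Hk in zip(delay, H):           # multiply each delayed block by its segment
            for i in range(2 * n):
                acc[i] += X[i] * Hk[i]        # element-by-element sum in the frequency domain
        y = fft(acc, sign=1)                  # a single inverse transform per cycle
        out.extend(v.real / (2 * n) for v in y[n:])   # discard the first half
        prev = block
    return out[:len(signal)]

def direct_convolve(signal, h):
    """Reference time domain convolution, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, c in enumerate(h):
            if i - j >= 0:
                out[i] += c * signal[i - j]
    return out
```

In this sketch the latency is governed by the block length N rather than by the overall length of the impulse response, since only one forward and one inverse transform of size 2N are needed per cycle.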

The general process of FIG. 3 can be utilized to perform the equivalent overall process of FIG. 1. This is illustrated in FIG. 4 wherein each of the six input channels e.g. 60 is first converted into the frequency domain utilizing an FFT overlap process 61. Each channel is then combined 62 in the frequency domain with frequency domain coefficients corresponding to the impulse response functions. The frequency domain process 62 can also include the adding of the frequency components together to form left and right outputs 64, 65. Finally, an inverse frequency domain discard process 66, 67 is applied so as to produce left and right channel outputs.

The required amount of computation can be substantially reduced by reducing the number of convolutions required.

Turning now to FIG. 5, there is illustrated one form of simplification carried out in the preferred embodiment wherein the centre channel 70 is multiplied by a gain factor 71 and added 72, 73 to the left and right channels respectively. Similarly, portions of the low frequency effects channel 71 are added to each of the other channels.
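The channel reduction just described can be sketched in Python as follows. This is an illustrative sketch only; the function name `downmix_6_to_4` and the default gain values are assumptions, as the specification does not state the gains used.

```python
def downmix_6_to_4(lf, cf, rf, ls, rs, lfe, centre_gain=0.7071, lfe_gain=0.5):
    """Fold the centre and low frequency effects channels into four channels.

    Each argument is a list of samples of equal length. The gain values are
    placeholders chosen for illustration, not values from the specification.
    """
    # Centre channel is scaled and added to the left and right front channels;
    # a portion of the LFE channel is added to each of the remaining channels.
    left = [a + centre_gain * c + lfe_gain * e for a, c, e in zip(lf, cf, lfe)]
    right = [a + centre_gain * c + lfe_gain * e for a, c, e in zip(rf, cf, lfe)]
    left_surround = [a + lfe_gain * e for a, e in zip(ls, lfe)]
    right_surround = [a + lfe_gain * e for a, e in zip(rs, lfe)]
    return left, right, left_surround, right_surround
```

Only the four returned channels then need to be transformed into the frequency domain.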

The signals are then subject to a Fourier transform overlap processor 75, 78, with the number of channels being subject to the computationally intensive Fourier transform process being reduced from 6 to 4. Next, the Fourier domain process 84 is applied to produce outputs 79, 80 to which an inverse Fourier transform and discarding process 82, 83 is applied to leave the left and right channels.

Turning now to FIG. 6, there is illustrated schematically the idealized overall end result of the process of FIG. 5 for the left ear where the four input signals 90 are each subjected to a corresponding full length finite impulse response filter 91-94 before being summed 95 to form a corresponding output signal 96. Unfortunately, in order to achieve a high level of realism, it is often necessary to utilize quite long impulse response functions. For example, impulse response functions of approximately 7,000 taps are not uncommon with standard 48 kHz audio signals. Again, with the arrangement of FIG. 6, excessive computational requirements can result with extended length filters.

An analysis of the details of the impulse response coefficients, and some experimentation, has shown that all of the cues necessary for an accurate localisation of the sound sources are contained within the time of the direct and first few orders of reflections, and that the rest of the impulse response is only required to emphasise the “size” and “liveness” of the acoustic environment. Use can be made of this observation to separate the directional or “head” part of each of the responses (say the first 1024 taps) from the reverberation or “tail” part. The “tail” parts can all be added together, and the resulting filter can be excited with the summation of the individual input signals. This simplified implementation is shown schematically 100 in FIG. 7. The head filters 101 to 104 can be short 1024 tap filters and the signals are summed 105 and fed to the extended tail filter which can comprise approximately 6000 taps, with the results being summed 109 to be output. This process is repeated for the right ear. The use of the combined tail reduces computation requirements in two ways. Firstly, there is the obvious reduction in the number of terms of convolution sums that must be computed in real time. This reduction is by a factor of the number of input channels. Secondly, the computation latency of the tail filter computation need only be short enough to align the first tap of the tail filter with the last tap of each of the head filters. When block filtering implementation techniques such as overlap/add, overlap/save, or the Low Latency convolution algorithm of the aforementioned PCT application are used, this means that, optionally, larger blocks can be used to implement the tail than the heads, at a lower frame rate.
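The arrangement of FIG. 7 for one ear can be sketched in the time domain as follows. This Python sketch is illustrative only: the names `convolve` and `head_tail_render` are hypothetical, the combined tail is formed by a plain sum of the individual tails as the text describes, and any scaling of the combined tail is left out because the specification does not state one.

```python
def convolve(x, h):
    """Full time domain convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def head_tail_render(inputs, responses, head_len):
    """Approximate sum_i(inputs[i] * responses[i]) for one ear.

    inputs    : list of channel signals, all of the same length
    responses : list of impulse responses, all of the same length
    head_len  : number of leading taps kept as an individual head filter
    """
    out = [0.0] * (len(inputs[0]) + len(responses[0]) - 1)
    # Each channel keeps its own short head filter.
    for x, h in zip(inputs, responses):
        for i, v in enumerate(convolve(x, h[:head_len])):
            out[i] += v
    # The inputs are summed and passed through one combined tail filter.
    mix = [sum(samples) for samples in zip(*inputs)]
    tail = [sum(taps) for taps in zip(*(h[head_len:] for h in responses))]
    # The tail output is delayed by head_len so that the first tail tap
    # lines up just after the last tap of each head filter.
    for i, v in enumerate(convolve(mix, tail)):
        out[i + head_len] += v
    return out
```

With four channels, 1024 tap heads and a 6144 tap tail, this structure evaluates 4 × 1024 + 6144 = 10240 filter taps per output sample instead of 4 × 7168 = 28672 for four full length filters. When only one channel is active and the other tails are zero, the result equals the full convolution exactly; otherwise the combined tail is an approximation.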

Turning now to FIG. 8, there is illustrated in detail an overall flow chart of the frequency domain processing when implementing the combined tail system of FIG. 7. The arrangement 110 of FIG. 8 is intended to operate as a Frequency Domain Process, such as 84 in FIG. 5. The overall system includes summations 111 and 112 for output to the left and right channels respectively. The four inputs, being the front left, front right, rear left and rear right, are treated symmetrically, with the first input being stored in delay storage block 113 and also being multiplied 114 with frequency domain coefficients derived from a first portion of the impulse response 115, with the output going to summer 111. The right channel is treated symmetrically to produce a right channel output and shall not be further discussed herein.

After a “cycle” the delayed block 113 is forwarded to the delay block 120. It would be obvious to those skilled in the art of DSP programming that this can comprise a mere remapping of data block pointers. During the next cycle, the coefficients 121 are multiplied 122 with the data in block 120 with the output being forwarded to the left channel summer 111. The two sets of coefficients 115 and 121 correspond to the head portion of the impulse response function. Each channel will have individualised head functions for the left and right output channels.

The outputs from the delay blocks 120, 125, 126 and 127 are forwarded to summer 130 and the sum stored in delay block 131. The delay block 131 and subsequent delay blocks e.g. 132, 133 implement the combined tail filter, with a first segment stored in delay block 131 being multiplied 136 with coefficients 137 for forwarding to the left channel sum 111. In the next cycle, the delay block 131 is forwarded to the block 132 and a similar process is carried out, as is carried out for each remaining delay block e.g. 133. Again the right channel is treated symmetrically.

It will be evident from the foregoing discussion that there are a number of impulse response functions or portions thereof used in the construction of the preferred embodiment. A further computational optimization of the process of synthesis of the frequency domain coefficient blocks will now be discussed, initially with reference to FIG. 9. In order to determine the required frequency domain coefficients, an impulse response 140 is divided into a number of segments 141 of length N. Each segment is padded with an additional N zero valued data values 142 before an FFT mapping to N complex data is applied 143 so as to convert the values into N frequency domain coefficients 144. This process can be repeated to obtain subsequent frequency domain coefficients 145, 146, 147.

The utilization of the segmentation process of FIG. 9 can often lead to artificially high frequency domain components. This is a direct consequence of the segmentation process and its interaction with the fast Fourier transform, which must result in frequency components that approximate the discontinuity at the end data values (the FFT being periodic modulo the data size). The resulting FFT often has significant high frequency components which are present substantially as a result of this discontinuity. In the preferred embodiment, a process is preferably undertaken so as to reduce the high frequency components to a point where a significant amount of the computation can be discarded due to a set of zero frequency domain components. This process of making band limited frequency domain coefficient blocks will now be discussed with reference to FIG. 10.

The initial impulse response 150 is again divided into segments of length N 151. Each segment is then padded to length 2N 152. The data 152 is then multiplied 154 with a “windowing” function 153 which includes graduated end portions 156, 157. The two end portions are designed to map the ends of the data sequence 151 to zero magnitude values whilst retaining the information in between. The resulting output 159 contains zero values at the points 160, 161. The output 159 is then subjected to a real FFT process to produce frequency domain coefficients 165 having a number of larger coefficients 167 in the lower frequency domain of the Fourier transform in addition to a number of negligible components 166 which can then be discarded. Hence, a final partial set of frequency domain components 169 are utilized as the frequency domain components representing the corresponding portion of the impulse response data.
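The effect of the windowing function can be sketched in Python as follows. This is an illustration under stated assumptions: the specification does not give the shape of the graduated end portions, so a raised cosine ramp is assumed here, and the names `dft`, `tapered_window`, `coefficient_block` and `high_band_energy` are hypothetical. A direct DFT is used in place of a real FFT purely to keep the sketch short.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) discrete Fourier transform, adequate for a small demonstration."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def tapered_window(n, taper):
    """Window of length n: 1.0 in the middle, raised cosine ramps down to 0.0
    over `taper` samples at each end (an assumed shape)."""
    w = [1.0] * n
    for i in range(taper):
        ramp = 0.5 * (1.0 - math.cos(math.pi * i / taper))
        w[i] = ramp
        w[n - 1 - i] = ramp
    return w

def coefficient_block(segment, taper=0):
    """Window a length N segment, pad it with N zeros and transform it."""
    n = len(segment)
    w = tapered_window(n, taper) if taper else [1.0] * n
    padded = [s * v for s, v in zip(segment, w)] + [0.0] * n
    return dft(padded)

def high_band_energy(coeffs):
    """Energy in the upper half of the non-redundant bins of a 2N point transform."""
    half = len(coeffs) // 2
    return sum(abs(c) ** 2 for c in coeffs[half // 2: half + 1])

# A constant segment has the sharpest possible discontinuity at its ends.
segment = [1.0] * 16
plain = coefficient_block(segment)             # as in FIG. 9
smooth = coefficient_block(segment, taper=4)   # as in FIG. 10
print(high_band_energy(plain), high_band_energy(smooth))
```

For this test segment the tapered block carries far less energy in its upper bins than the untapered block, which is what allows those coefficients to be treated as negligible and discarded.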

The discarding of the components 166 means that, during the convolution process, only a restricted form of convolution processing need be carried out and that it is unnecessary to multiply the full set of N complex coefficients, as a substantial number of them are 0. This again leads to increased efficiency gains in that the computational requirements of the convolution process are restricted. Additionally, significant reductions in the memory requirements of the algorithm are possible by taking advantage of the fact that the data and coefficient storage can both be reduced as a result of this discarding of coefficients.

In one preferred embodiment, N is equal to 512, the head filters are 1024 taps in length, and the tail filters are 6144 taps in length. Hence the head filters are composed of two blocks of coefficients each (as illustrated in FIG. 8) and the tail filters are composed of twelve blocks of coefficients each. In this preferred embodiment, all head filters and the first 4 blocks of each tail filter are implemented utilising complete sets of coefficient Fourier transforms, the next 4 blocks of each tail filter are implemented utilising coefficient blocks in which only the lower half of the frequency components are present, and the last 4 blocks of each tail filter are implemented utilising coefficient blocks in which only the lower quarter of the frequency components are present.
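The block counts of this embodiment, and a rough measure of the resulting saving, can be worked through as follows. The cost unit used here (one multiplication of a data block by a complete coefficient block) and the variable names are assumptions made for illustration; transform costs and memory are not counted.

```python
N = 512            # block length
HEAD_TAPS = 1024   # taps per head filter
TAIL_TAPS = 6144   # taps per combined tail filter
CHANNELS = 4       # input channels per ear after the simplification of FIG. 5

head_blocks = HEAD_TAPS // N   # coefficient blocks per head filter
tail_blocks = TAIL_TAPS // N   # coefficient blocks per tail filter

# Fraction of the frequency components retained in each tail block:
# first 4 blocks complete, next 4 lower half only, last 4 lower quarter only.
tail_fractions = [1.0] * 4 + [0.5] * 4 + [0.25] * 4
tail_cost = sum(tail_fractions)

# Per ear: four individual heads plus one combined, band limited tail,
# compared with four independent full length filters.
simplified_cost = CHANNELS * head_blocks + tail_cost
full_cost = CHANNELS * (head_blocks + tail_blocks)

print(head_blocks, tail_blocks, tail_cost, simplified_cost, full_cost)
```

On this simple count the combined, band limited arrangement needs 15 block multiplications per ear per cycle, against 56 for four independent full length filters.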

The preferred embodiment can be extended to the situation where higher frequency audio inputs are utilized but it is desired to maintain a low frequency computational requirement. For example, it is now common in the industry to adopt a 96 kHz sample rate for digital samples and hence it would be desirable to provide for convolution of impulse responses also sampled at this rate. Turning to FIG. 11, there is illustrated one form of extension which utilizes the lower impulse response sampling rate. In this arrangement, the input signal at a rate of 96 kHz 170 is forwarded to a delay buffer 171. The input signal is also low pass filtered 172 and then decimated 173 by a factor of 2 down to a 48 kHz rate. The decimated signal is then FIR filtered in accordance with the previously outlined system before the sample rate is doubled 175, followed by a low pass filter 176. The result is then added 177 to the originally delayed input signal forwarded out of delay buffer 171, which has been pre-multiplied 178 by a gain factor A. The output of summer 177 forms the convolved digital output.

Normally, if the desired 96 kHz impulse response is denoted h₉₆(t) then the 48 kHz FIR coefficients, denoted h₄₈(t), could be derived from LowPass[h₉₆(t)], where the notation here is intended to signify that the original impulse response h₉₆(t) is low pass filtered. However, in the improved method of FIG. 11, if the desired response is denoted h₉₆(t) and the delayed impulse response is denoted A.δ(t−τ), then the 48 kHz FIR coefficients denoted h₄₈(t) can be derived from LowPass[h₉₆(t)−A.δ(t−τ)]. The delay factor τ and the gain factor A are chosen so that the resulting signal from the gain element 178 has the correct time of arrival and amplitude to produce the high-frequency components that are required in the direct-arrival part of the 96 kHz acoustic impulse response. Also, instead of using a delay and gain arrangement, it would be possible to use a sparse FIR to create multiple wide-band and frequency shaped echoes.

Hence, it can be seen that the preferred embodiment provides for a reduced computation convolution system maintaining a substantial number of the characteristics of the full convolution system.

The preferred embodiment takes a multi-channel digital input signal or surround sound input signal such as Dolby Prologic, Dolby Digital (AC-3) and DTS, and uses one or more sets of headphones for output. The input signal is binaurally processed utilizing the abovementioned technique so as to improve listening experiences through the headphones on a wide variety of source material, thereby making it sound “out of head” or to provide for increased surround sound listening.

Given such a processing technique to produce an out of head effect, a system for undertaking processing can be provided utilising a number of different embodiments. For example, many different physical embodiments are possible and the end result can be implemented utilising either analog or digital signal processing techniques or a combination of both.

In a purely digital implementation, the input data is assumed to be obtained in digital time-sampled form. If the embodiment is implemented as part of a digital audio device such as compact disc (CD), MiniDisc, digital video disc (DVD) or digital audio tape (DAT), the input data will already be available in this form. If the unit is implemented as a physical device in its own right, it may include a digital receiver (SPDIF or similar, either optical or electrical). If the invention is implemented such that only an analog input signal is available, this analog signal must be digitised using an analog to digital converter (ADC).

This digital input signal is then processed by a digital signal processor (DSP) of some form. Examples of DSPs that could be used are:

1. A semi-custom or full-custom integrated circuit designed as a DSP dedicated to the task.

2. A programmable DSP chip, for example the Motorola DSP56002.

3. One or more programmable logic devices.

In the case where the embodiment is to be used with a specific set of headphones, filtering of the impulse response functions may be applied to compensate for any unwanted frequency response characteristics of those headphones.

After processing, the stereo digital output signals are converted to analog signals using digital to analog converters (DAC), amplified if necessary, and routed to the stereo headphone outputs, perhaps via other circuitry. This final stage may take place either inside the audio device in the case that an embodiment is built-in, or as part of the separate device should an embodiment be implemented as such.

The ADC and/or DAC may also be incorporated onto the same integrated circuit as the processor. An embodiment could also be implemented so that some or all of the processing is done in the analog domain. Embodiments preferably have some method of switching the “binauraliser” effect on and off and may incorporate a method of switching between equaliser settings for different sets of headphones or controlling other variations in the processing performed, including, perhaps, output volume.

In a first embodiment illustrated in FIG. 12, the processing steps are incorporated into a portable CD or DVD player as a replacement for a skip protection IC. Many currently available CD players incorporate a “skip-protection” feature which buffers data read off the CD in random access memory (RAM). If a “skip” is detected, that is, the audio stream is interrupted by the mechanism of the unit being bumped off track, the unit can reread data from the CD while playing data from the RAM. This skip protection is often implemented as a dedicated DSP, either with RAM on-chip or off-chip.

This embodiment is implemented such that it can be used as a replacement for the skip protection processor with a minimum of change to existing designs. This embodiment can most probably be implemented as a full-custom integrated circuit, fulfilling the function of both existing skip protection processors and implementation of the out of head processing. A part of the RAM already included for skip protection could be used to run the out of head algorithm for HRTF-type processing. Many of the building blocks of a skip protection processor would also be useful for the processing described for this invention. An example of such an arrangement is illustrated in FIG. 12.

In this example embodiment, the custom DSP 200 is provided as a replacement for the skip protection DSP inside a CD or DVD player 202. The custom DSP 200 takes the input data from the disc and outputs stereo signals to a digital to analogue converter 201 which provides analogue outputs which are amplified 204, 205 for providing left and right speaker outputs. The custom DSP can include onboard RAM 206 or alternatively external RAM 207 in accordance with requirements. A binauralizer switch 28 can be provided for turning on and off the binauralizer effect.

In a second embodiment illustrated in FIG. 13 the processing is incorporated into a digital audio device 210 (such as a CD, MiniDisc, DVD or DAT player) as a replacement for the DAC. In this implementation the signal processing is performed by a dedicated integrated circuit 211 incorporating a DAC. This can easily be incorporated into a digital audio device with only minor modifications to existing designs as the integrated circuit can be virtually pin compatible with existing DACs.

The custom IC 211 includes an onboard DSP core 212 and the normal digital to analogue conversion facility 213. The custom IC takes the normal digital data output and performs the processing via DSP 212 and digital to analogue conversion 213 so as to provide for stereo outputs. Again, a binauralizer switch 214 can be provided to control binauralization effects as required.

In a third embodiment, illustrated in FIG. 14, the processing is incorporated into a digital audio device 220 (such as a CD, MiniDisc, DVD or DAT player) as an extra stage 221 in the digital signal chain. In this implementation the signal processing is performed by either a dedicated or programmable DSP 221 mounted inside a digital audio device and inserted into the stereo digital signal chain before the DAC 222.

In a fourth embodiment, illustrated in FIG. 15, the processing is incorporated into an audio device (such as a personal cassette player or stereo radio receiver 230) as an extra stage in the analog signal chain. This embodiment uses an ADC 232 to make use of the analog input signals. This embodiment can most likely be fabricated on a single integrated circuit 231, incorporating an ADC 232, DSP 233 and DAC 234. It may also incorporate some analog processing. This could be easily added into the analog signal chain in existing designs of cassette players and similar devices.

In a fifth embodiment, illustrated in FIG. 16, the processing is implemented as an external device for use with stereo input in digital form. The embodiment can be implemented as a physical unit in its own right or integrated into a set of headphones as described earlier. It can be battery powered with the option to accept power from an external DC plugpack supply. The device takes digital stereo input in either optical or electrical form as is available on some CD and DVD players or similar. Input formats can be SPDIF or similar and the unit may support surround sound formats such as Dolby Digital AC-3, or DTS. It may also have analog inputs as described below. Processing is performed by a DSP core 241 inside custom IC 242. This is followed by a DAC 243. If this DAC cannot directly drive the headphones, additional amplifiers 246, 247 are added after the DAC. This embodiment of the invention may be implemented on a custom integrated circuit 242 incorporating DSP, DAC, and possibly headphone amplifier.

Alternatively, the embodiment can be implemented as a physical unit in its own right or integrated into a set of headphones. It can be battery powered with the option to accept power from an external DC plugpack supply. The device takes analog stereo input which is converted to digital data via an ADC. This data is then processed using a DSP and converted back to analog via a DAC. Some or all of the processing may instead be performed in the analog domain. This implementation could be fabricated onto a custom integrated circuit incorporating ADC, DSP, DAC and possibly a headphone amplifier as well as any analog processing circuitry required.

The embodiment may incorporate a distance or “zoom” control which allows the listener to vary the perceived distance or environment of the sound source. In a preferred embodiment this control is implemented as a slider control. When this control is at its minimum the sound appears to come from very close to the ears and may, in fact, be plain unbinauralised stereo. At this control's maximum setting the sound is perceived to come from a distance. The control can be varied between these extremes to control the perceived “out-of-head”-ness of the sound. By starting the control in the minimum position and sliding it towards maximum, the user will be able to adjust to the binaural experience more quickly than with a simple binaural on/off switch. Implementation of such a control can comprise utilizing different sets of filter responses for different distances.
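One way such a control could select between filter sets is sketched below in Python, as an illustration only. The names `select_filter_set` and `apply_filter_set`, the slider range of 0 to 1, and the array layout (output channel, input channel, tap) are assumptions made for this sketch and are not taken from the specification. The set used at the minimum setting would be a pass-through set, corresponding to plain unbinauralised stereo.

```python
import numpy as np


def select_filter_set(zoom, filter_sets):
    """Return the filter set nearest to a slider position.

    zoom        -- assumed slider position in [0, 1]; 0 is closest, 1 is farthest
    filter_sets -- list of arrays ordered from near to far, each shaped
                   (2 outputs, 2 inputs, taps); this layout is hypothetical
    """
    if not 0.0 <= zoom <= 1.0:
        raise ValueError("zoom must lie in [0, 1]")
    index = int(round(zoom * (len(filter_sets) - 1)))
    return filter_sets[index]


def apply_filter_set(stereo_in, filters):
    """Filter a stereo block of shape (2, n) with one filter set.

    Each output channel is the sum of both input channels, each convolved
    with its own impulse response.
    """
    n = stereo_in.shape[1]
    taps = filters.shape[2]
    out = np.zeros((2, n + taps - 1))
    for o in range(2):
        for i in range(2):
            out[o] += np.convolve(stereo_in[i], filters[o, i])
    return out
```

With a pass-through set (a unit impulse from left input to left output and from right input to right output) at the first position, a slider setting of 0 reproduces the input unchanged, and higher settings select sets measured or designed for greater distances.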

Example implementations having slider mechanisms are shown in FIG. 17. Alternatively, additional controls can be provided for switching audio environments.

As a further alternative, an embodiment could be implemented as a generic integrated circuit solution suiting a wide range of applications including those set out previously. This same integrated circuit could be incorporated into virtually any piece of audio equipment with headphone output. It would also be the fundamental building block of any physical unit produced specifically as an implementation of the invention. Such an integrated circuit would include some or all of ADC, DSP, DAC, memory, I²S stereo digital audio input, S/PDIF digital audio input, headphone amplifier as well as control pins to allow the device to operate in different modes (eg analog or digital input).

It would be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.

We claim:
 1. A method of processing a series of input audio signals representing a series of virtual audio sound sources placed at predetermined positions around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of: (a) for each of said input audio signals and for each of said audio output signals; (i) convolving said input audio signals with an initial head portion of a corresponding impulse response mapping substantially the initial sound and early reflections for an impulse response of a corresponding virtual audio source to a corresponding speaker device so as to form a series of initial responses; (b) for each of said input audio signals and for each of said audio output signals: (i) forming a combined mix from said audio input signals; and (ii) determining a single convolution tail; (iii) convolving said combined mix with said single convolution tail to form a combined tail response; (c) for each of said audio output signals; (i) combining a corresponding series of initial responses and a corresponding combined tail response to form said audio output signal.
 2. A method as claimed in claim 1 wherein said single convolution tail is formed by combining the tails of said corresponding impulse responses.
 3. A method as claimed in claim 1 further comprising the step of: preprocessing said impulse response functions by: (a) constructing a set of corresponding impulse response functions; (b) dividing said impulse response functions into a number of segments; (c) for a predetermined number of said segments, reducing the impulse response values at the ends of said segments.
 4. A method as claimed in claim 1 wherein said input audio signals are translated into the frequency domain and said convolution is carried out in the frequency domain.
 5. A method as claimed in claim 4 wherein said impulse response functions are simplified in the frequency domain by zeroing higher frequency coefficients and eliminating multiplication steps where said zeroed higher frequency coefficients are utilized.
 6. A method as claimed in claim 1 wherein said convolutions are carried out utilizing a low latency convolution process.
 7. A method as claimed in claim 6 wherein said low latency convolution process includes the steps of: transforming first predetermined overlapping block sized portions of said input audio signals into corresponding frequency domain input coefficient blocks, transforming second predetermined block sized portions of said impulse responses signals into corresponding frequency domain impulse coefficient blocks; combining said each of said frequency domain input coefficient blocks with predetermined ones of said corresponding frequency domain impulse coefficient blocks in a predetermined manner to produce combined output blocks; adding together predetermined ones of said combined output blocks to produce frequency domain output responses for each of said audio output signals; transforming said frequency domain output responses into corresponding time domain audio output signals; discarding part of said time domain audio output signals; outputting the remaining part of said time domain audio output signals.
 8. A method as claimed in claim 1 wherein said series of input audio signals include a left front channel signal, a right front channel signal, a front centre channel signal, a left back channel signal and a right back channel signal.
 9. A method as claimed in claim 1 wherein said audio output signals comprise left and right headphone output signals.
 10. A method as claimed in claim 1 wherein said method is performed utilising a skip protection processor unit located inside a CD-ROM player unit.
 11. A method as claimed in claim 1 wherein said method is performed utilising a dedicated integrated circuit comprising a modified form of a digital to analog converter.
 12. A method as claimed in claim 1 wherein said method is performed utilising a dedicated or programmable Digital Signal Processor.
 13. A method as claimed in claim 1 wherein said method is performed on analog inputs by a DSP processor interconnected between an Analog to Digital Converter and a Digital to Analog Converter.
 14. A method as claimed in claim 1 wherein said method is performed on stereo output signals on a separately detachable external device connected intermediate of a sound output signal generator and a pair of headphones, said sound output signals being output in a digital form for processing by said external device.
 15. A method as claimed in claim 1 further comprising utilizing a variable control to alter the impulse response functions in a predetermined manner.
 16. A method as claimed in claim 1 wherein said single convolution tail is formed by selecting one impulse response from the set of tails of said corresponding impulse responses.
 17. A method of processing an input audio signal representing a virtual audio sound source placed at a predetermined position around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of, for each said speaker device: (a) converting the input audio signal to a lower sample rate, by a low-pass filtering and decimation process, to produce a decimated input signal; (b) applying a filtering process to said decimated input signal, to produce a decimated filtered signal; (c) converting said decimated filtered signal to the original higher sample rate, by an interpolation and low-pass filtering process, to produce a high sample-rate filtered signal; (d) applying a sparse filtering process to said input audio signal, to produce a sparsely filtered audio signal; (e) summing together said high sample-rate filtered signal and said sparsely filtered audio signal to produce an audio output signal; (f) outputting said audio output signal to said speaker device.
 18. The method of claim 17 wherein said sparse filtering process consists of a single delay element and a gain function.
 19. The method of claim 18 wherein said sparse filtering process consists of a delay line, with multiple tapped audio signals taken from the delay line, each said tapped audio signal scaled through a gain function, and the outputs of said gain functions added to produce said sparsely filtered audio signal.
 20. A method for processing input audio signals representing a plurality of audio sound sources at corresponding positions relative to a listener to generate one or more output signals for presentation to convey spatial impressions of the corresponding positions to the listener, wherein for a respective output signal, the method comprises: generating a plurality of first filtered signals by applying frequency-domain representations of respective first filters to frequency-domain representations of respective input audio signals; generating a second filtered signal by applying a frequency domain representation of a second filter to a mix of the frequency-domain representations of the input audio signals, wherein one or more high-frequency coefficients of the frequency-domain representation of the second filter and of the mix of frequency-domain representations of the input audio signals are excluded from the applying; and generating the respective output signal by combining the first filtered signals and the second filtered signal.
 21. A method according to claim 20, wherein one or more high-frequency coefficients of the frequency-domain representations of respective first filters and of the frequency-domain representations of respective input audio signals are excluded from the applying.
 22. A method according to claim 20, wherein the first filters correspond to head portions of respective impulse responses that convey spatial impressions of the corresponding positions to the listener and the second filter corresponds to a combination of tail portions of the respective impulse responses.
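
By way of illustration only, the head and tail split recited in claims 1 and 2 might be sketched as follows for a single output channel. The function name `render_output`, the array layout, the direct time-domain convolution, and the choice of averaging the tails to form the single convolution tail are all assumptions made for this sketch; the claims also contemplate other ways of determining the tail (for example selecting one tail, as in claim 16) and low latency frequency-domain convolution.

```python
import numpy as np


def render_output(inputs, impulse_responses, head_len):
    """Form one output channel from several input channels.

    inputs            -- array (n_inputs, n_samples), one row per virtual source
    impulse_responses -- array (n_inputs, ir_len), the response from each
                         virtual source to this output channel
    head_len          -- assumed number of taps treated as the head (direct
                         sound and early reflections); must be < ir_len
    """
    n_inputs, n = inputs.shape
    ir_len = impulse_responses.shape[1]
    if not 0 < head_len < ir_len:
        raise ValueError("head_len must lie strictly between 0 and ir_len")

    out = np.zeros(n + ir_len - 1)

    # Step (a): convolve every input with the head of its own response.
    for x, h in zip(inputs, impulse_responses):
        out[: n + head_len - 1] += np.convolve(x, h[:head_len])

    # Step (b): one mix of all inputs, one tail shared by all of them.
    # Averaging the tails is an assumption made for this sketch.
    mix = inputs.sum(axis=0)
    single_tail = impulse_responses[:, head_len:].mean(axis=0)
    tail_response = np.convolve(mix, single_tail)

    # Step (c): add the tail response, delayed by the head length.
    out[head_len : head_len + len(tail_response)] += tail_response
    return out
```

In this sketch the long tail is convolved once per output channel rather than once per input, which is where the saving in computation arises. When all the impulse responses happen to share the same tail, the result equals the full convolution exactly; otherwise it is an approximation of the late reverberation.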